States are redesigning their educator effectiveness systems to provide more information
and more support to improve teaching. In the process, they increasingly look beyond the
most basic and historically most common view of measuring student performance: how
many students “passed” the State test in a given year. A number of States now require an
objective measure of student growth to be part of teacher evaluations.


States are using student growth measures to understand teacher effectiveness for good
reasons. First, student learning is the most important expectation we set for schools, and
nothing in a school impacts student learning more than effective teaching.

Second, new data systems permit far better links between student outcomes (tests,
graduation, postsecondary experiences) and specific schools and teachers. This facilitates
assessment of and systemic learning about changes to policy and practice that might
lead to improvements in the quality of teaching and public schools.


Finally, traditional methods of evaluating teachers that typically do not include objective 
measures of teacher performance have in most state education agencies (SEAs) and 
local educational agencies (LEAs) provided inadequate information about teacher 
effectiveness. In particular, these methods tend to yield high ratings for almost all 
teachers, and consequently these ratings have little value in predicting either future 
teacher effectiveness or student achievement. As a result, they have yielded little
information that can help teachers become more effective practitioners. 


This brief describes various approaches to measuring student growth and what research 
says about the extent to which student growth may be used as a measure of teacher
performance.


Approaches to Measuring Teacher Performance


There are a number of different methods SEAs and LEAs can use to translate student test
achievement into a measure of teacher performance. All of these involve predicting how
well students in a teacher's classroom will perform on tests and contrasting this to
their actual performance. The main difference between the statistical methods described
below is the way in which researchers, SEAs and LEAs form expectations or predictions
about student achievement. Once they create predictions about how students will do, they
attribute the aggregated difference (across students in a classroom) between these
predictions and actual student achievement, at least to some degree, to teachers. This
forms the basis of a teacher performance measure.


The Reform Support Network, sponsored by the U.S. Department of Education, supports the Race to 
the Top grantees as they implement reforms in education policy and practice, learn from each other, 
and build their capacity to sustain these reforms, while sharing these promising practices and lessons 
learned with other States attempting to implement similarly bold education reform initiatives. 


An increasing amount of literature explores the extent
to which different methods of predicting student
achievement and then aggregating these across
students in a classroom result in accurate measures of
teacher performance. This brief touches on this issue,
but its primary purpose is to describe different models,
not assess their validity.


In general, models that take into account where 
students start before some educational intervention or 
receiving instruction from a particular teacher add an 
important dimension to understanding learning and 
the contributions that schools and teachers make to 
learning. Most of this brief offers an overview of some 
of the more common of these models. It concludes 
with a discussion of the importance of using multiple 
measures to identify teacher performance.


Value-Added Models 


Value-added models (VAMs) are a class of models 
that measure student test achievement against some 
prediction of how students are expected to do given 
their earlier achievement level and, depending on the 
specific model, other factors thought to both influence 
student learning and reside outside the control of 
teachers and schools. Educational researchers have
long used the value-added framework to address
questions about the efficacy of different interventions
and the effects of different levels of school resources,
such as class size.


SEAs and LEAs can use VAMs to help answer questions 
about actual performance in light of expected results: 
Is the learning this student demonstrated on this year’s 
test greater or less than would be expected, in light of 
the performance of other students with similar prior 
achievement and similar backgrounds? 


As part of teacher evaluation systems, VAMs aim to
predict what student growth can be expected from an
average or typical teacher, and then compare actual
student achievement with that prediction. A teacher's
value-added score is intended to convey how much
that individual teacher contributes to student learning in
a particular subject in a particular year. Teachers whose
students grow more than would be expected under a typical
teacher are thought to have added value. Teachers whose
students grow less than would be expected under a typical
teacher are considered less effective.
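
The comparison of actual with predicted achievement can be illustrated with a minimal sketch. It assumes the simplest possible prediction, a straight-line fit of this year's score on last year's score across all students, with no other controls; the VAMs used in practice are considerably more elaborate. The function name, record layout and scores below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Average (actual - predicted) score per teacher.

    records: list of (teacher, prior_score, current_score) tuples.
    Assumption: the prediction is a one-variable least-squares line
    fitted on all students; prior scores must not all be identical.
    """
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    mean_prior, mean_current = mean(priors), mean(currents)

    # Ordinary least squares for: current = intercept + slope * prior
    sxx = sum((x - mean_prior) ** 2 for x in priors)
    sxy = sum((x - mean_prior) * (y - mean_current)
              for x, y in zip(priors, currents))
    slope = sxy / sxx
    intercept = mean_current - slope * mean_prior

    # Collect each student's residual under his or her teacher.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = intercept + slope * prior
        residuals[teacher].append(current - predicted)

    return {teacher: mean(values) for teacher, values in residuals.items()}


# Hypothetical scores for two teachers with two students each.
sample_records = [
    ("Teacher A", 40, 52),
    ("Teacher A", 60, 70),
    ("Teacher B", 40, 46),
    ("Teacher B", 60, 64),
]
vam_scores = value_added(sample_records)
```

In this made-up data, both teachers serve students with the same prior scores, so a positive result for one teacher and a negative result for the other reflects only the difference between actual and predicted achievement.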


VAM measures of teacher performance differ
according to the particular VAM used because models
differ in how they adjust for student and out-of-school
factors that influence achievement and in the
way they compare teachers. Some models,
for instance, predict student achievement based only
on prior test scores, while others include controls for
factors such as a student's race and ethnicity, eligibility
for free or reduced-price lunch, and so on. Teacher
performance, in turn, may be judged relative to
other teachers in the same school or relative to a larger
set of teachers, such as those in an LEA or a whole state.
The differences between models are sometimes small,
but can also have meaningful impacts on estimates
of teacher performance, particularly for teachers who
are serving students with backgrounds that differ from
those in an average classroom.


Gain Score Model 


Many educators and policymakers are familiar with 
one VAM, the Gain Score model. This model measures 
changes in individual student achievement between 
two or more points in time, for example from the 
beginning to end of a school year or from one 
administration of an annual test to the next, but does
not include any statistical adjustments for the type
of students served or the resources that schools or
teachers have on hand.


A virtue of gain scores is that they are easy to calculate:
they simply entail taking an individual student's score
in a particular grade and subject and
subtracting, for instance, that student's score
in the same subject in the prior grade. Measuring
student growth in this way is not new. Teachers and
researchers have long used pre- and post-test designs
to understand learning over time and gauge how
much new knowledge students gain as a consequence
of, for instance, an intervention or classroom lesson.
This model helps answer questions about the amount
of learning that has taken place: How much did this
student learn last year? How much, on average, did
this teacher's students’ performance change over the
course of this school year? How much did this group of
eighth-graders learn compared with all eighth-graders
in the State?


To explore the progress made by students in a
particular classroom assigned to a specific teacher,
a gain score model typically averages the gains
across the class and compares that average with
those of other teachers. Were this average gain used
as a measure of teacher performance, the implicit
assumption would be that all students would have
equal achievement gains in the absence of differences
among teachers, so that differences in average gains
are attributed to teachers.
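
The gain score calculation is simple enough to sketch in a few lines. The example below assumes that prior-year and current-year scores are on a comparable scale; the function name, record layout and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_gains(records):
    """Average gain (current score minus prior score) per teacher.

    records: list of (teacher, prior_score, current_score) tuples.
    No adjustment is made for student background or school resources.
    """
    gains = defaultdict(list)
    for teacher, prior, current in records:
        gains[teacher].append(current - prior)
    return {teacher: mean(values) for teacher, values in gains.items()}


# Hypothetical scores for two teachers with two students each.
gain_records = [
    ("Teacher A", 40, 52),
    ("Teacher A", 60, 70),
    ("Teacher B", 40, 46),
    ("Teacher B", 60, 64),
]
class_gains = average_gains(gain_records)
```

Comparing the two averages treats the whole difference between classrooms as attributable to the teachers, which is exactly the implicit assumption described above.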


Student Growth Percentile Model


The Student Growth Percentile (SGP) model
is intended to facilitate a comparison of learning from one
grade to the next or one test to the next when the
aim is to assess how a student's performance compares
with that of other students with a similar prior achievement
level. The SGP model uses a statistical procedure
called “quantile regression” that calculates where in the
achievement distribution a student falls relative to other
students with a similar prior test score history. Thus,
for example, a student who has a growth percentile of
75 in the 4th grade had test achievement growth from
the 3rd to the 4th grade that equaled or exceeded that of 75
percent of students who started with a similar prior
achievement level in the 3rd grade.
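
The idea behind a growth percentile can be illustrated with a deliberately simplified sketch. It does not use quantile regression. Instead it assumes that "similar prior achievement" means falling in the same 10-point band of prior scores, and it ranks each student's current score against the other students in that band. The band width, function name and data are hypothetical.

```python
from collections import defaultdict


def growth_percentiles(students, band_width=10):
    """Simplified growth percentile for each student.

    students: dict mapping name -> (prior_score, current_score).
    Returns name -> percent of academic peers (other students in the
    same prior-score band) whose current score the student equaled or
    exceeded, or None when a student has no peers in the band.
    Assumption: banding stands in for quantile regression here.
    """
    bands = defaultdict(list)
    for name, (prior, current) in students.items():
        bands[prior // band_width].append((name, current))

    result = {}
    for members in bands.values():
        for name, current in members:
            peers = [score for other, score in members if other != name]
            if not peers:
                result[name] = None
                continue
            at_or_below = sum(1 for score in peers if score <= current)
            result[name] = round(100 * at_or_below / len(peers))
    return result


# Hypothetical 3rd-grade (prior) and 4th-grade (current) scores.
sample_students = {
    "s1": (41, 50),
    "s2": (43, 55),
    "s3": (45, 60),
    "s4": (47, 65),
    "s5": (49, 62),
    "s6": (72, 80),
}
sgps = growth_percentiles(sample_students)
```

In this made-up data, student s5 equals or exceeds three of four academic peers and so receives a growth percentile of 75, while s6 has no peers in the same band and receives no percentile.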


The SGP model can answer certain questions, such as: 
Did this student learn as much this year as she learned 
last year? Did this student learn as much in math as he 
learned in reading? Did these students learn as much as 
their peers in our LEA? Which program or instructional 
approach resulted in the most student learning? 


When examining results aggregated across students, 
it is typical to calculate an average or median of all the 
student growth percentiles at the level of interest—for 
instance, a classroom, school or LEA. Using the SGP
model, States also can set a target for adequate growth,
for example, the growth needed to reach or maintain 
proficiency. 
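
Aggregating growth percentiles to the classroom level can be sketched as follows, using the median as the summary. The function name, roster layout and percentile values are hypothetical.

```python
from statistics import median


def median_growth_percentile(percentiles, roster):
    """Median student growth percentile for each classroom.

    percentiles: dict mapping student -> growth percentile.
    roster: dict mapping classroom -> list of student names.
    Students without a percentile are skipped.
    """
    summary = {}
    for classroom, names in roster.items():
        values = [percentiles[n] for n in names
                  if percentiles.get(n) is not None]
        summary[classroom] = median(values) if values else None
    return summary


# Hypothetical growth percentiles and class rosters.
sample_percentiles = {"s1": 20, "s2": 75, "s3": 60, "s4": 90, "s5": 40}
sample_roster = {"Room 1": ["s1", "s2", "s3"], "Room 2": ["s4", "s5"]}
classroom_medians = median_growth_percentile(sample_percentiles, sample_roster)
```

The same function could be applied to a school or LEA by supplying a roster at that level.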


Value Added: What the Research Says 


Since Tennessee pioneered use of a VAM in its State 
accountability system in the early 1990s, student
growth models and VAMs have become more refined 
and sophisticated. The use of these models to gauge 
learning and the impact of specific teachers continues 
to develop and evolve. A growing body of research 
shows that this kind of analysis contributes valuable 
information to understanding teaching and learning. 


Information about student achievement and 
teacher contributions to student results 


VAMs do not predict or attribute student achievement 
with pinpoint accuracy because not everything about 
student achievement can be predicted easily and 
explained with the data that is typically at hand. Still, 
research has found that value-added measures predict 
future student achievement better than other factors 
commonly used for important personnel decisions 
about teachers. Used with other measures, VAMs can
increase understanding about teacher practice and its
connection to student learning.


Research has found that these models hold much
promise in using student test results to make reasonably
reliable inferences about teacher effectiveness.
Moreover, the results from VAMs closely match
principals’ evaluations of the most effective and least
effective teachers.


A role for multiple measures of teacher 
effectiveness 


Although research shows that growth models and 
VAMs shed important and unique light on teacher 
effectiveness, studies also suggest that SEAs and LEAs 
should be cautious about basing high-stakes decisions 
solely on these models. Researchers and practitioners 
still disagree about whether and how student test-based
measures of teacher performance should be
used. In addition, test-based measures alone do
not provide teachers with timely feedback on their
practices.


SEAs and LEAs should consider a number of limitations 
to VAMs. The statistical techniques may not account 
fully for the fact that students are generally not 
randomly assigned to teachers. Without documenting
and modeling the particular ways that schools
assign teachers to classrooms, some experts argue, it
is difficult to fully account for how those choices influence
the effects attributed to each teacher's skills, efforts and
success.


The realities of testing (students guess both correctly
and incorrectly, or come to school with the
flu on testing day) could also fog the picture of
effectiveness painted by the results and lead to the
inaccurate classification of teachers. Variation in the
value teachers are found to add from year to year
also has raised questions about the models. This
leaves some wondering if it is plausible for teacher
effectiveness to vary so greatly. The Brown Center on
Education Policy at Brookings has noted, however,
that these year-to-year variations are consistent with
annual job performance measures in other fields. They
also resemble the variations between SAT scores and
first-year college grade-point averages, and most feel
comfortable about using SAT scores to make high-stakes
decisions about students’ college readiness.
Thus, while it is vital to understand limitations
and to use results carefully, growth models and
VAMs improve on teacher effectiveness systems that
already are notoriously inaccurate: SEAs and LEAs have
routinely deemed all but a relative handful of teachers
to be effective.


LEAs and SEAs are overcoming these challenges as 
VAMs grow more common and more sophisticated. 
Using several years of data may make the estimates 
of teacher effectiveness more stable; the models’ 
reliability increases with additional years of data 
(particularly up to three years of results). The data
systems that LEAs and States rely on are also getting 
better at identifying which students are assigned to 
which teachers for different types of instruction. 


Still, given some of the limitations of student test-based 
measures of teacher effectiveness, there is an emerging 
consensus that SEAs and LEAs would be well-served
to use multiple measures to arrive at a summative
assessment of teacher performance. Even proponents
who contend that “value-added is superior to other
existing methods of classifying teachers,” suggest that
teacher evaluation should have many facets.


The Measures of Effective Teaching (MET) project 
funded by the Bill and Melinda Gates Foundation 
recommends that classroom observations, student 
achievement gains and student survey feedback be 
used together as a set of multiple measures to evaluate 
teachers. MET researchers hold that the combination
of these multiple measures increases the ability to
predict future student achievement, improves reliability,
and provides richer diagnostic feedback that teachers
can use to improve. The MET project has found that
teachers who demonstrated greater effectiveness
in classroom observations had higher student
achievement gains than other teachers. However,
classroom observations alone did not predict student
achievement as reliably as observations combined with
student feedback and achievement gains.


Conclusion 


As SEAs and LEAs introduce more comprehensive 
educator effectiveness systems that include measures 
of growth in learning and value added, they face 
many technical issues. Beyond the challenge of 
measuring growth in non-tested grades and subjects, 
States must address the quality of the assessments 
they use, including whether they can link test results 
vertically, from grade to grade. States must contend 
with issues of awareness and understanding as well. 
The more complex the questions they seek to answer,
the more complex and less transparent the model for
arriving at the answers becomes.


Although many laypeople can relatively easily
calculate gain scores, it is harder for teachers
or the public to replicate the results from value-added
or SGP models given the statistical expertise required
to do so, the size of the data sets they marshal and
concerns about student privacy when exploring how
academic peers perform. States are well served to go
to great lengths to explain their models and to make 
them as transparent as possible. 


Shifting to growth models and VAMs is a significant 
step SEAs and LEAs have taken to improve their data 
systems in recent years. SEAs’ work to build high-quality
data, longitudinal capacity and the ability to
match students and teachers to each other and to
student results makes the use of the models more
feasible.


VAMs and growth models are new and rely on data 
systems that SEAs and LEAs are still building, but 
preliminary research shows that “combining new 
approaches to measuring effective teaching—while 
not perfect—significantly outperforms traditional 
measures.” Used as part of a body of evidence
collected to measure student growth and teacher 
effectiveness, VAMs and growth models hold promise 
for helping policymakers and practitioners collect 
and analyze sophisticated data on teaching and 
learning that can guide professional development, the 
distribution of human resources and decisions about 
career milestones within the nation’s schools. 


Yet SEAs and LEAs can use value-added approaches 
only when they consistently use standardized tests 
across several grades and subjects. These conditions 
exist in most States testing third-grade through
eighth-grade mathematics and reading/English
language arts, and high school mathematics and
English language arts. But in most LEAs, testing
programs provide data to measure the effectiveness
of fewer than half of the teachers. Measuring learning in
non-tested grades and subjects presents challenges 
to using growth models and VAMs. The Reform 
Support Network has begun to address these 
challenges through a seminar on measuring student 
growth in non-tested grades and subjects, a student 
learning objective work group of Race to the Top 
states and other publications, including a guide on 
student learning objectives. 